118 research outputs found

    Latent class analysis variable selection

    We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNP
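
The headlong search described in this abstract can be illustrated with a minimal sketch. It assumes a user-supplied scoring function, here called `evidence` (a hypothetical name, not taken from the paper), that returns a positive value when a variable appears to carry clustering information beyond the variables already selected. In practice that score might be a BIC difference between the two competing latent class models; the toy version below just hard-codes which variables are informative.

```python
from typing import Callable, FrozenSet, Iterable, List


def headlong_search(
    variables: Iterable[str],
    evidence: Callable[[str, FrozenSet[str]], float],
    threshold: float = 0.0,
    max_iter: int = 100,
) -> List[str]:
    """Greedy headlong search over candidate clustering variables.

    `evidence(v, selected)` is an assumed, user-supplied score: values above
    `threshold` favour the model in which `v` adds clustering information
    beyond the variables in `selected`.
    """
    candidates = list(variables)
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False
        # Inclusion step: take the FIRST candidate that clears the threshold,
        # rather than scoring every candidate and taking the best.
        for v in candidates:
            if v not in selected and evidence(v, frozenset(selected)) > threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the FIRST selected variable that no longer
        # clears the threshold given the other selected variables.
        for v in list(selected):
            others = frozenset(selected) - {v}
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break
    return selected


# Toy score for illustration only: x1 and x3 are treated as informative.
informative = {"x1", "x3"}


def toy_evidence(v: str, selected: FrozenSet[str]) -> float:
    return 1.0 if v in informative else -1.0


result = headlong_search(["x1", "x2", "x3", "x4"], toy_evidence)
print(result)
```

Accepting the first variable that clears the threshold keeps the number of model fits per step small, which matters when each score requires fitting latent class models.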

    Structurama: Bayesian Inference of Population Structure

    Structurama is a program for inferring population structure. Specifically, the program calculates the posterior probability of assigning individuals to different populations. The program takes as input a file containing the allelic information at some number of loci sampled from a collection of individuals. After reading a data file into computer memory, Structurama uses a Gibbs algorithm to sample assignments of individuals to populations. The program implements four different models: The number of populations can be considered fixed or a random variable with a Dirichlet process prior; moreover, the genotypes of the individuals in the analysis can be considered to come from a single population (no admixture) or as coming from several different populations (admixture). The output is a file of partitions of individuals to populations that were sampled by the Markov chain Monte Carlo algorithm. The partitions are sampled in proportion to their posterior probabilities. The program implements a number of ways to summarize the sampled partitions, including calculation of the ‘mean’ partition—a partition of the individuals to populations that minimizes the squared distance to the sampled partitions
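
One way to picture the "mean" partition summary is the sketch below. It is not Structurama's implementation: it swaps in a simple pair-counting distance (the number of pairs of individuals grouped together in one partition but not in the other) and restricts the search to the sampled partitions themselves. Both choices, and the function names, are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def pair_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Count pairs of individuals co-assigned in one partition but not the other.

    Partitions are label vectors; the distance ignores the label values
    themselves, so [0, 0, 1] and [1, 1, 0] are at distance 0.
    """
    return sum(
        (p[i] == p[j]) != (q[i] == q[j])
        for i, j in combinations(range(len(p)), 2)
    )


def mean_partition(samples: List[Sequence[int]]) -> Sequence[int]:
    """Return the sampled partition minimizing the summed squared distance
    to all sampled partitions (search restricted to the samples)."""
    return min(
        samples,
        key=lambda cand: sum(pair_distance(cand, s) ** 2 for s in samples),
    )


# Four hypothetical MCMC samples for four individuals.
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as above with labels swapped
    [0, 1, 1, 1],
]
summary = mean_partition(samples)
print(summary)
```

Comparing partitions through co-assignment of pairs avoids the label-switching problem, since population labels carry no meaning across MCMC samples.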

    Joint modeling of longitudinal outcomes and survival using latent growth modeling approach in a mesothelioma trial

    Joint modeling of longitudinal and survival data can provide more efficient and less biased estimates of treatment effects through accounting for the associations between these two data types. Sponsors of oncology clinical trials routinely and increasingly include patient-reported outcome (PRO) instruments to evaluate the effect of treatment on symptoms, functioning, and quality of life. Known publications of these trials typically do not include jointly modeled analyses and results. We formulated several joint models based on a latent growth model for longitudinal PRO data and a Cox proportional hazards model for survival data. The longitudinal and survival components were linked through either a latent growth trajectory or shared random effects. We applied these models to data from a randomized phase III oncology clinical trial in mesothelioma. We compared the results derived under different model specifications and showed that the use of joint modeling may result in improved estimates of the overall treatment effect
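
As a rough illustration of the shared-random-effects link mentioned in this abstract, a common textbook formulation is written out below. The notation is assumed here rather than taken from the paper: a linear growth curve for the PRO score of patient i at visit j, a single treatment indicator trt_i, and association parameters alpha_0 and alpha_1.

```latex
\begin{align*}
% Longitudinal submodel: linear latent growth curve with random intercept and slope
y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
  \qquad (b_{0i}, b_{1i})^\top \sim N(0, \Sigma_b), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
% Survival submodel: Cox proportional hazards sharing the same random effects
h_i(t) &= h_0(t)\, \exp\!\left( \gamma\, \mathrm{trt}_i + \alpha_0\, b_{0i} + \alpha_1\, b_{1i} \right).
\end{align*}
```

Under the same assumed notation, linking through the latent growth trajectory instead would replace the two random-effect terms in the hazard with a single term in the current trajectory value, (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t.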

    Mixture of latent trait analyzers for model-based clustering of categorical data

    Model-based clustering methods for continuous data are well established and commonly used in a wide range of applications. However, model-based clustering methods for categorical data are less standard. Latent class analysis is a commonly used method for model-based clustering of binary data and/or categorical data, but due to an assumed local independence structure there may not be a correspondence between the estimated latent classes and groups in the population of interest. The mixture of latent trait analyzers model extends latent class analysis by assuming a model for the categorical response variables that depends on both a categorical latent class and a continuous latent trait variable; the discrete latent class accommodates group structure and the continuous latent trait accommodates dependence within these groups. Fitting the mixture of latent trait analyzers model is potentially difficult because the likelihood function involves an integral that cannot be evaluated analytically. We develop a variational approach for fitting the mixture of latent trait models and this provides an efficient model fitting strategy. The mixture of latent trait analyzers model is demonstrated on the analysis of data from the National Long Term Care Survey (NLTCS) and voting in the U.S. Congress. The model is shown to yield intuitive clustering results and it gives a much better fit than either latent class analysis or latent trait analysis alone
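
The model structure described in this abstract can be sketched for the special case of binary items. The symbols are assumptions chosen for illustration: x_{nm} is the response of observation n to item m, z_n the latent class, y_n a D-dimensional latent trait, and \phi the standard normal density.

```latex
\begin{align*}
z_n &\sim \mathrm{Categorical}(\eta_1, \dots, \eta_G), \qquad y_n \sim N_D(0, I), \\
\Pr(x_{nm} = 1 \mid y_n, z_n = g) &= \pi_{mg}(y_n)
  = \frac{1}{1 + \exp\{-(b_{mg} + w_{mg}^\top y_n)\}}, \\
p(x_n) &= \sum_{g=1}^{G} \eta_g \int \prod_{m=1}^{M}
  \pi_{mg}(y)^{x_{nm}} \{1 - \pi_{mg}(y)\}^{1 - x_{nm}}\, \phi(y)\, dy .
\end{align*}
```

The integral over y in the last line has no closed form because of the logistic terms, which is the difficulty the variational approach is meant to address. Setting every w_{mg} to zero removes the dependence on y and recovers ordinary latent class analysis.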